Automation of finding strong gravitational lenses in the Kilo Degree Survey with U – DenseLens (DenseLens + Segmentation)

ABSTRACT In the context of upcoming large-scale surveys like Euclid, automating strong lens detection is essential. While existing machine learning pipelines rely heavily on the classification probability ($P$), this study addresses the importance of integrating additional metrics, such as the Information Content (IC) and the number of pixels above the segmentation threshold ($n_s$), to reduce the false positive rate in unbalanced data-sets. In this work, we introduce a segmentation algorithm (U-Net) as a supplementary step in the established strong gravitational lens identification pipeline (DenseLens), which primarily utilizes the $P_{\rm mean}$ and $IC_{\rm mean}$ parameters for detection and ranking. The results demonstrate that the inclusion of segmentation reduces the false positives by approximately 25 per cent in the final sample extracted from DenseLens, without compromising the identification of strong lenses. The main objective of this study is to automate the strong lens detection process by integrating these three metrics. To achieve this, a decision tree-based selection process is introduced and applied to the Kilo Degree Survey (KiDS) data. This process involves rank-ordering based on classification scores ($P_{\rm mean}$), filtering based on Information Content ($IC_{\rm mean}$), and segmentation score ($n_s$). Additionally, the study presents 14 newly discovered strong lensing candidates identified by the U-DenseLens network using the KiDS DR4 data.

The number of galaxy-scale strong lens candidates will increase by three orders of magnitude with upcoming large-scale sky surveys. Around $10^5$ strong lenses are expected to be discovered (e.g. Pawase et al. 2014; Serjeant 2014; Collett 2015) by upcoming large-scale sky surveys such as the Large Synoptic Survey Telescope (LSST; Tyson 2002), Euclid (Laureijs et al. 2010), the Square Kilometer Array (SKA; Dewdney et al. 2009; Koopmans, Browne & Jackson 2004; Quinn et al. 2015), and the Chinese Space Station Telescope (CSST; Zhan 2018). Using human volunteers as classifiers becomes increasingly difficult (next to impossible) with these upcoming surveys. Davies (2022) showed that human classifiers were less successful than a Convolutional Neural Network (hereafter CNN) at classifying strong lenses when subjected to a classification task in a Zooniverse (Simpson, Page & De Roure 2014) project. CNNs have also been greatly preferred after showing promising results in the strong gravitational lens finding challenge (Metcalf et al. 2019).
A Convolutional Neural Network (CNN; Lecun et al. 1998) is an adaptive learning algorithm that learns the features of images utilizing spatial hierarchy through gradient-based backpropagation. Owing to their efficiency, CNNs have been largely preferred over other machine-learning techniques (such as SVM, Random Forests) and extensively used in recent research methodologies (Petrillo et al. 2017; Lanusse et al. 2018; Pearson et al. 2018; Pourrahmani, Nayyeri & Cooray 2018; Schaefer et al. 2018; Davies et al. 2019; Metcalf et al. 2019; 2019a, b; Cañameras et al. 2020; Christ et al. 2020; Li et al. 2020; 2021; Gentile et al. 2021; Rezaei et al. 2022) to find strong lenses. However, due to the highly unbalanced nature of the data-set and the close resemblance of some classes of non-lenses to lens candidates, a large number of false positives in the final sample cannot be avoided. In our previous paper (Nagam et al. 2023), we introduced the DenseNet architecture as a significant step towards mitigating false positives. Building upon this foundation, we advocate for an advanced approach to further diminish false positives.
To achieve a further reduction in false positives, we introduce segmentation techniques alongside Convolutional Neural Networks (CNNs). Segmentation is a technique in which selected pixels of the image are classified into one or more classes. Some of the popular segmentation architectures include Faster R-CNN (Ren et al. 2015), Mask R-CNN (He et al. 2017), SegNet (Badrinarayanan, Kendall & Cipolla 2017), and U-Net (Ronneberger, Fischer & Brox 2015). Faster R-CNN has been used in the morphology classification of radio sources (Wu et al. 2018), the detection of L-dwarfs (Cao et al. 2023), the detection and classification of astronomical targets (Jia, Liu & Sun 2020), and the detection of supernovae (Wu 2020; Guo et al. 2021). Mask R-CNN, a successor of Faster R-CNN, has been used in the morphological segmentation of galaxies (Farias et al. 2020; Gu et al. 2023), to detect, classify and deblend astronomical sources (Burke et al. 2019), to detect and classify sources in radio continuum images (Riggi et al. 2023), and to detect and mask ghosting and scattered-light artefacts in optical survey images (Tanoglidis et al. 2022). Segmentation using U-Net was first proposed by Ronneberger et al. (2015) for medical image segmentation. Since then, it has been widely used in various fields. In radio astronomy, U-Net has been used to separate clean signal from RFI signatures (Akeret et al. 2017) and for the automatic recognition of RFI (Long et al. 2019). U-Net has also been used to segment the spiral arms of disc galaxies (Bekki 2021) and to denoise astronomical images (Vojtekova et al. 2020; Qi et al. 2022).
In the case of extensive surveys, such as those being carried out with Euclid, which entail the analysis of millions of candidates, the post-processing results from the DenseLens pipeline can still yield thousands of candidates requiring daily vetting. For example, the output from the DenseLens pipeline can still contain false positive candidates with features such as arcs or background contamination, which can closely resemble strong lensing features. Due to the highly unbalanced nature of the data-sets, where typically one sample out of 1000 is a lens, the number of false positives ending up in the final sample can be significant.
Hence, to further reduce these false positives in the final sample, we explore a novel idea of using a segmentation algorithm (U-Net) to segment images into the lensed source pixels of the strong lensing candidates and the 'rest' of the field (other sources in the field and the lens galaxy). Typically, U-Net is favoured over alternative versions of R-CNN due to its lighter model structure (fewer parameters), while maintaining comparable efficiency for semantic segmentation tasks (Widyaningrum et al. 2022). We use U-Nets in addition to the DenseLens (Nagam et al. 2023) network (implemented in our previous paper) to classify strong lenses and to reduce false positives in the final sample.
In Section 2, we describe the KiDS data-sets used for classification and rank-ordering. In Section 3, we describe the methodology to segment source pixels. We explain our results in Section 4 and finally we provide our discussion and main conclusion in Section 5 and Section 6, respectively.

DATA-SETS
KiDS is a wide-field optical imaging survey operating with a 268 million pixel square CCD mosaic camera (OmegaCAM; Kuijken et al. 2011). The KiDS survey is the deepest of the three wide-area public imaging surveys ever conducted with the best observing conditions. KiDS covers around 1350 square degrees of extragalactic sky in four filters ($u$, $g$, $r$, $i$). The $r$-band images have the optimal seeing conditions, with a median Point Spread Function (PSF) FWHM of < 0.7 arcsec and an exposure time of 1800 s. In this paper, we utilize the data from 904 tiles of the KiDS DR4 data release (Kuijken et al. 2019). We have used ∼3.8 million $r$-band cutouts of size 101 × 101 pixels, which corresponds to an area of 20 arcsec × 20 arcsec.

Data selection
We present a detailed account of the methodology employed in creating KiDS cutouts, outlined below. Our study exclusively utilizes $r$-band images, with $g$, $r$, $i$ images showcased solely for illustrative purposes. Our approach is similar to the methodology introduced by Li et al. (2020) and Petrillo et al. (2019a).
1. Bright Galaxy (BG) sample: We create the BG sample without any colour cuts because such cuts are arbitrary and the colours of the foreground lens can be contaminated by lensing when the Einstein radius is small (Li et al. 2020). We employ two criteria for selecting KiDS cutouts: (i) Setting the parameter SG2DPHOT to 0 to exclusively target galaxies. SG2DPHOT is a flag generated by the automated tool 2DPHOT (Barbera et al. 2008), offering both integrated and surface photometry for galaxies within an image. (ii) Employing SEXTRACTOR (Bertin & Arnouts 1996), we generate a catalogue using the $r$-band MAG_AUTO magnitude with the constraint $r_{\rm auto} \leq 21$. This yields approximately 3.8 million cutouts.

2. LRG sample: We focus on selecting Luminous Red Galaxies (LRGs) with redshifts ($z$) less than 0.4 (Petrillo et al. 2019a). This involves isolating areas in the ($r - i$) and ($g - r$) colour diagrams based on a set of colour criteria. This selection results in approximately 126 000 LRG cutouts.

METHODOLOGY
In our prior research (Nagam et al. 2023), we introduced a pioneering approach for the classification and rank-ordering of strong gravitational lenses, called DenseLens. To further enhance accuracy and reduce false positives, we introduce an integrated approach, U-DenseLens, which combines DenseLens with a U-Net segmentation network. The integration of DenseLens and U-Net segmentation aims to refine our model's accuracy in identifying strong lens candidates. In Section 3.1, we briefly introduce DenseLens and in Section 3.2, we delve into the application of U-Net for pixel segmentation within input lens candidate images, providing a detailed methodology for training and classification.

DenseLens
Using DenseLens (Nagam et al. 2023), we demonstrated the application of a classification and regression ensemble pipeline for classifying and rank-ordering strong lenses. Upon providing an input image to the DenseLens algorithm, four densely connected networks generate classification scores within the range 0-1. The mean of these scores ($P_{\rm mean}$) is subsequently computed. We select the candidates with $P_{\rm mean}$ values above a relatively large designated threshold ($P_{\rm thres}$). DenseLens also uses a metric called the Information Content (IC), which aids in ranking images based on the number of resolution elements in noiseless mock lensed images above a brightness threshold, relative to the background noise ($\sigma$). It scales with the ratio of this area ($A_{\rm src,2\sigma}$) in units of the PSF area ($A_{\rm PSF}$), multiplied by the ratio ($R$) of the Einstein radius ($R_{\rm E}$) over the effective source radius ($R_{\rm eff}$) explained in equation (2), preventing a high IC value for lenses with a very large source but a small Einstein radius (which would be hard to identify as a lens). The images that pass the $P$-value threshold are passed to regression networks which are trained to predict IC values. The mean of the outputs from these regression networks is denoted as $IC_{\rm mean}$. The filtered candidates are then systematically rank-ordered based on the computed $IC_{\rm mean}$ values. Hereafter, the terms $P_{\rm mean}$ and $IC_{\rm mean}$ are written as $P$ and IC for simplicity.
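The selection and ranking logic described above can be illustrated with a minimal Python sketch. It is not the DenseLens implementation: the function names, the dictionary layout, and the example threshold are hypothetical, and the network outputs are assumed to be already available as plain lists of numbers. Ranking in descending order of the mean IC is also an assumption.

```python
from statistics import mean


def select_by_p(p_scores_by_id, p_thres):
    """Keep images whose mean classification score exceeds p_thres.

    p_scores_by_id maps a (hypothetical) image id to the list of scores
    produced by the ensemble members for that image.
    """
    selected = {}
    for image_id, scores in p_scores_by_id.items():
        p_mean = mean(scores)
        if p_mean > p_thres:
            selected[image_id] = p_mean
    return selected


def rank_by_ic(ic_scores_by_id):
    """Return image ids sorted by mean predicted IC, highest first (assumed order)."""
    ic_mean = {image_id: mean(values) for image_id, values in ic_scores_by_id.items()}
    return sorted(ic_mean, key=ic_mean.get, reverse=True)


# Illustrative inputs only; these are not values from the paper.
p_scores = {"a": [0.90, 1.00, 0.95, 0.99], "b": [0.2, 0.4, 0.3, 0.1]}
passed = select_by_p(p_scores, p_thres=0.8)
ranking = rank_by_ic({"a": [100.0, 120.0], "c": [300.0, 310.0]})
```

Averaging over the ensemble before thresholding means a single over-confident network cannot promote a candidate on its own.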

Segmentation
To further reduce false positive rates, we introduce an additional segmentation network at the end of the DenseLens pipeline. This augmented configuration, comprising DenseLens and the U-Net segmentation network, is termed U-DenseLens. This integrated approach aims to further refine the accuracy of our model by leveraging the capabilities of U-Net segmentation in the identification of strong lens candidates. In this study, we employ U-Net to find source pixels within input lens candidate images. Specifically, for lenses, our training approach labels all source pixels as 1 and all other pixels as 0. Conversely, for non-lenses, all pixels are trained with a label of 0. A detailed explanation of this methodology is provided in Appendix A.
The U-Net architecture used in this paper is illustrated in Fig. A2. We employ interpolation techniques to resize the initial 101 × 101 pixel input image to a more suitable 256 × 256 pixel configuration. This resizing step is crucial as the U-Net architecture requires the input image size to be divisible by 32, and the dimensions of the bottom layer should not be excessively small. For instance, opting for a scaled input size of 256 × 256 in the first layer results in a layer size of 16 × 16 in the middle layer. Striking a balance is essential, as selecting a larger input size increases computational complexity, while opting for a lower input size may compromise information flow among the middle layers. The resulting output from the U-Net model is down-sampled again to 101 × 101 pixels, with each pixel exhibiting values in the range 0-1. A notable modification involves employing a sigmoid activation function (Narayan 1997) for the final layer. This modification ensures that each pixel obtained as output in the final layer possesses values within the range 0-1. To classify source pixels, we set the segmentation threshold ($S_{\rm thres}$) to 0.6. We determined this threshold value through extensive experimentation. Intuitively, setting the threshold too low would result in selecting pixels for which the U-Net lacks confidence. Conversely, a high threshold could lead to a multitude of candidates with minimal pixels in the segmentation output. Therefore, we opted for a moderate threshold value, striking a balance. As a third metric we use $n_s$, the total number of classified source pixels above the segmentation threshold ($S_{\rm thres}$).
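As a minimal sketch of how $n_s$ follows from a segmentation map, the snippet below counts the pixels whose value exceeds $S_{\rm thres} = 0.6$. The function name and the nested-list representation of the map are assumptions made for illustration; the actual pipeline operates on the 101 × 101 U-Net output.

```python
S_THRES = 0.6  # segmentation threshold quoted in the text


def count_source_pixels(seg_map, s_thres=S_THRES):
    """Return n_s: the number of pixels strictly above the threshold.

    seg_map is a 2D nested list of sigmoid outputs in the range 0-1
    (a hypothetical stand-in for the down-sampled U-Net output).
    """
    return sum(1 for row in seg_map for value in row if value > s_thres)


# Tiny illustrative map: two of the four pixels exceed 0.6.
example_map = [[0.10, 0.70],
               [0.60, 0.95]]
n_s = count_source_pixels(example_map)
```

A pixel exactly at the threshold is not counted here, following the wording "above the segmentation threshold"; whether the original code uses a strict inequality is not stated in the text.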
We experimented with mock lenses to investigate diverse threshold values to categorize candidates. This classification included easily identifiable lenses (green), candidates devoid of source pixels (red), and those falling within intermediate classifications (yellow). This leads us to define the following scheme:
$n_s \geq 40$: green
$0 < n_s < 40$: yellow
$n_s = 0$: red
The values $n_s$, $P$, and IC are utilized to classify and establish a rank order for strong gravitational lenses (using a decision tree; see Section 4.2). By combining multiple metrics, our approach aims to enhance the robustness and accuracy of the classification and ranking process.
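The colour scheme defined above maps directly onto a small function. The sketch below is illustrative; the function name is hypothetical, and only the boundaries (0 and 40) come from the text.

```python
def segmentation_category(n_s):
    """Map the source-pixel count n_s onto the red/yellow/green scheme."""
    if n_s < 0:
        raise ValueError("n_s is a pixel count and cannot be negative")
    if n_s == 0:
        return "red"      # no pixel above the segmentation threshold
    if n_s < 40:
        return "yellow"   # intermediate regime, 0 < n_s < 40
    return "green"        # n_s >= 40, easily identifiable lenses


categories = [segmentation_category(n) for n in (0, 1, 39, 40, 250)]
```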

RESULTS
In this section, we systematically generate mock data by combining simulated lensed sources with the LRGs from KiDS (Petrillo et al. 2017) and apply our U-DenseLens model to them. We explain the results in Section 4.1. We then apply our network to the BG sample and develop a decision-tree-based automation technique, showing the results in Section 4.2. This selection is validated through a voting mechanism with human classifiers, emphasizing the agreement of the decision tree results with the human classifier votes. Our results demonstrate the efficiency of the proposed approach, offering insights into the reduction of false positives without compromising genuine strong lensing candidates. Additionally, we extend our analysis to the LRG sample in Section 4.3, revealing the versatility and generalizability of our decision tree in optimizing the selection process for diverse data-sets.

Mock data
We generated the mock data as detailed by Nagam et al. (2023), consisting of 10 000 lenses and 10 000 non-lenses. We aim to investigate the impact of segmentation on both classes of mock samples. To achieve this, we deliberately set the threshold parameter ($P_{\rm thres}$) at a low value of 0.3. This choice was made to minimize the extent of filtering applied during the classification prediction ($P$), allowing us to observe the effects of segmentation on the mock sample classes. Following classification with this very low threshold score, 9586 of the 10 000 mock lenses and 875 of the 10 000 non-lenses were identified as lens candidates. The distribution of these candidates over the segmentation categories shows that 77 per cent of the non-lens candidates fall in the Red category, whereas only 14 per cent of the lens candidates share the same categorization. This implies that by excluding candidates in the Red category, we can eliminate 77 per cent of false positives, at the cost of discarding only 14 per cent of true lenses. In an unbalanced data-set, where often only one in a thousand galaxies is a genuine gravitational lens, this drastically reduces the ratio of false positives to true positives (by about a factor of four). We also show the top 25 rank-ordered candidates from the mock data and their respective segmentation maps in Fig. B1 (top and bottom, respectively).
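The quoted factor of about four can be checked with a short calculation. The sketch below uses only the two percentages given in the text (77 per cent of false positives and 14 per cent of true lenses removed); the function name is hypothetical.

```python
def fp_to_tp_ratio_scaling(fp_removed, tp_removed):
    """Factor by which the FP/TP ratio is multiplied after the cut.

    fp_removed and tp_removed are the fractions of false positives and
    true positives discarded by rejecting the Red category.
    """
    return (1.0 - fp_removed) / (1.0 - tp_removed)


scaling = fp_to_tp_ratio_scaling(fp_removed=0.77, tp_removed=0.14)
improvement = 1.0 / scaling  # how many times smaller the FP/TP ratio becomes
print(round(improvement, 2))  # prints 3.74
```

Because both classes are scaled by fixed fractions, this factor does not depend on the lens to non-lens ratio of the underlying sample.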

General approach
To optimally combine the three metrics ($P$, IC, and $n_s$) and assess their respective impact on the identification, we have devised a decision tree to reduce the false positives in the final sample (see the flow diagram in Fig. 1). Our decision tree for the selection of strong gravitational lenses consists of a number of steps and selection criteria:
(i) Rank-order candidates based on $P$: We initiate the selection process by rank-ordering candidates according to their classification scores ($P$).
(ii) Filtering candidates with IC ≤ 50: The IC quantifies the ranking of images by considering the resolution elements in noiseless mock lensed images above a brightness threshold, with scaling factors based on the ratio of the Einstein radius ($R_{\rm E}$) to the effective source radius ($R_{\rm eff}$), aiming to provide higher IC values for easily recognizable lenses. Candidates characterized by an IC less than or equal to 50 are filtered out in this step. This strategic filtering step is crucial to exclude candidates exhibiting thick blob-like structures. Fig. 2 illustrates the 12 highest ranked candidates that fall within this category. Notably, these candidates often present challenges in differentiation due to the presence of dense central structures, making them indistinguishable as lenses (except for the first candidate). Despite their high $P$ values and elevated ranks, the necessity to eliminate candidates with IC values less than or equal to 50 is apparent. We note that various tests have shown the results to be relatively robust against changes in this value.
(iii) Remove candidates with $n_s = 0$ and IC ≤ 104: Subsequent to the IC-based filtration, candidates with no classified source pixels ($n_s = 0$) and IC values less than or equal to 104 are removed. We discuss how we arrived at this value later in this section. This additional step ensures a refined selection process by removing the candidates that are completely rejected by the U-Net segmentation algorithm.
MNRAS 533, 1426-1441 (2024)
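The three steps of the decision tree can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' code: the `Candidate` container, the function name, and the example values are hypothetical, while the two IC thresholds (50 and 104) and the $n_s = 0$ condition are taken from the text.

```python
from dataclasses import dataclass

IC_MIN = 50    # step (ii): candidates with IC <= 50 are filtered out
IC_BLUE = 104  # step (iii): n_s = 0 candidates survive only if IC > 104


@dataclass
class Candidate:
    name: str
    p: float    # mean classification score
    ic: float   # mean predicted Information Content
    n_s: int    # number of source pixels above the segmentation threshold


def apply_decision_tree(candidates):
    """Return the surviving candidates, rank-ordered by P (highest first)."""
    ranked = sorted(candidates, key=lambda c: c.p, reverse=True)  # step (i)
    kept = []
    for cand in ranked:
        if cand.ic <= IC_MIN:                      # step (ii)
            continue
        if cand.n_s == 0 and cand.ic <= IC_BLUE:   # step (iii)
            continue
        kept.append(cand)
    return kept


# Illustrative candidates only; these are not objects from the paper.
sample = [
    Candidate("a", p=0.99, ic=40.0, n_s=50),   # removed in step (ii)
    Candidate("b", p=0.97, ic=80.0, n_s=0),    # removed in step (iii)
    Candidate("c", p=0.98, ic=120.0, n_s=0),   # kept ("blue" regime)
    Candidate("d", p=0.96, ic=60.0, n_s=10),   # kept
]
survivors = [c.name for c in apply_decision_tree(sample)]
```

Keeping the thresholds as named constants makes it easy to test how sensitive the final sample is to their values.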

Visual inspection and classification
Throughout this paper, we use the term 'Human classifiers' to collectively refer to the eight authors involved in this study. This group was tasked with evaluating the top thousand candidates selected from the BG sample, as they had high $P$ values ($P \gtrsim 0.95$) and were rank-ordered solely based on the value of $P$. Each human classifier voted on all candidates, utilizing a set of four pre-defined options. Each option is associated with a corresponding score: to these four categories, we assign weights of 1.0, 0.7, 0.3, and 0.0, respectively.
The distribution of candidates in the red, yellow, and green categories (as defined in Section 3.2) yielded 306, 488, and 206 candidates, respectively. Thus we could argue that if we remove these 306 candidates with $n_s = 0$ (red) as false positives, we can potentially reduce the false positives in the final sample by 30 per cent without human culling. However, prior to this removal, it is necessary to ensure that, as far as possible, only false positives and not genuine candidates are excluded. Consequently, we undertook a validation process, automating the identification of strong lenses through a voting mechanism involving human classifiers.
The top 1000 candidates were presented to human classifiers in a randomized, label-free manner to eliminate any potential bias. The voting results, plotted against $P$, are depicted in Fig. 3. Candidates that obtained votes of either 'lens' (a weight of 1) or 'maybe lens' (a weight of 0.7) from four or more human classifiers are marked as triangles and will be referred to as democratically elected samples throughout this paper. There were 306 candidates with $n_s = 0$. If we define positive samples as the candidates having a mean of human classifier votes ($s_m$) > 0.5 and the remainder as negative samples, then the 306 candidates having $n_s = 0$ (red) split into 19 positive samples and 287 negative samples. Hence, rejecting $n_s = 0$ (red) candidates enables us to eliminate an additional 287 false positives in the final sample. But before we include this selection step, we want to identify how many of the 19 positive samples can be retained based on an additional selection. In Fig. 4, we have plotted the IC values against the percentages of TP and FP. The figure shows how many TP (out of 19 positive candidates) and FP (out of 287 negative candidates) pass an IC threshold ranging from the lowest IC (50) to the maximum IC of all candidates. We find that at IC = 104 we can make a trade-off, as it retains approximately 60 per cent of the true positive candidates (11 out of 19) and also lowers the false positives to approximately 9 per cent (27 out of 287). We define the candidates which fall in the IC > 104 range, despite $n_s = 0$, as blue candidates. The top six of these blue candidates are illustrated in Fig. 5. The top six candidates of the other regimes, namely yellow ($0 < n_s < 40$), green ($n_s \geq 40$), and red ($n_s = 0$), are also shown in the same figure. We have also shown the distribution of TP and FP in the BG sample with respect to the mean of human classifier votes in Table 2.
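The two ways of turning votes into labels, the mean score $s_m$ and the democratic election, can be sketched as follows. The helper names are hypothetical, and votes are represented directly by their weights (1.0, 0.7, 0.3, 0.0); the thresholds (mean above 0.5, at least four votes with weight 0.7 or higher) follow the text.

```python
from statistics import mean


def mean_vote(votes):
    """s_m: the mean of the weights assigned by the human classifiers."""
    return mean(votes)


def is_positive(votes, s_thresh=0.5):
    """Positive sample if the mean vote exceeds 0.5."""
    return mean_vote(votes) > s_thresh


def is_democratically_elected(votes, min_votes=4, min_weight=0.7):
    """Elected if at least four classifiers voted 'lens' (1.0) or 'maybe lens' (0.7)."""
    return sum(1 for v in votes if v >= min_weight) >= min_votes


# Illustrative ballots from eight classifiers; not votes from the paper.
strong = [1.0, 1.0, 1.0, 0.7, 0.7, 0.3, 0.0, 0.0]
split = [0.7, 0.7, 0.7, 0.7, 0.0, 0.0, 0.0, 0.0]
weak = [1.0, 0.3, 0.3, 0.3, 0.0, 0.0, 0.0, 0.0]
```

The `split` ballot shows that the two definitions can disagree: four 'maybe lens' votes elect the candidate, although its mean score of 0.35 keeps it below the positive-sample threshold.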
Instead of arbitrarily defining positive samples as $s_m > 0.5$, we could also define positive samples as democratically elected samples, i.e. those that a majority of the human classifiers (at least 4 out of 8 people) voted as maybe lenses (0.7) or as sure lenses (1). When removing all red candidates in the top 1000 candidates from the BG sample, we recognized a potential loss of 16 democratically elected red candidates. The total count of these democratically elected candidates in Fig. 3 (marked by triangles) is 169, split into 16 red, 15 blue, 60 green, and 78 yellow candidates. The proposed removal strategy would result in only a ∼10 per cent loss, specifically sixteen red candidates. Therefore, our validation of the segmentation algorithm, based on the votes from human classifiers, underscores that eliminating red candidates could reduce the false positives in the final sample by a quarter, while losing considerably fewer strong lensing candidates.
We have also found 14 strong lensing candidates in the BG sample which have not been previously discovered. These candidates have been voted as lens or maybe lens by four or more human classifiers. The candidates are shown in Fig. 9.
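The tally of democratically elected BG candidates and the resulting loss fraction can be verified with a short calculation. The counts are those quoted in the text; the dictionary layout and function name are illustrative.

```python
def loss_fraction(counts_by_colour, removed_colour):
    """Fraction of elected candidates lost when one colour regime is removed."""
    total = sum(counts_by_colour.values())
    return counts_by_colour[removed_colour] / total


# Democratically elected candidates in the BG sample, as quoted in the text.
bg_elected = {"red": 16, "blue": 15, "green": 60, "yellow": 78}

total_elected = sum(bg_elected.values())
red_loss_per_cent = 100.0 * loss_fraction(bg_elected, "red")
print(total_elected)                 # prints 169
print(round(red_loss_per_cent, 1))   # prints 9.5
```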

Random forest analysis
A critical analysis was performed to discern the primary contributor to decision-making among the three metrics ($P$, IC, $n_s$). The feature importance is determined through a Random Forest model (Breiman 2001), by quantifying the reduction in Gini impurity (Gini 1921) between parent and child nodes in the decision trees. Detailed computations are provided in Appendix E. A higher reduction in Gini impurity signifies greater importance.
In our analysis, we only use the $P$, IC, and $n_s$ values of the top 1000 candidates (rank-ordered based on the $P$ value) from the BG sample as input features, given that only those were also given to human classifiers. The output values, being the mean of human voting results ($s_m$) rounded to the nearest integer value (0 or 1), were employed for training a random forest comprising 100 000 decision trees. The resulting feature importances are 37.5, 41.0, and 21.5 per cent for $P$, IC, and $n_s$, respectively. The feature importance analysis shows that $P$ and IC have very similar importance, while $n_s$ still contributes considerably to the final selection, albeit with less weight than $P$ and IC.
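To make the notion of a reduction in Gini impurity concrete, the sketch below evaluates it for a single split of a node into two children, using the standard textbook definition. It is not the computation of Appendix E, and the function names are hypothetical; in a random forest, such reductions are accumulated per feature over all splits and trees to give the feature importances.

```python
def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


def gini_reduction(parent, left, right):
    """Impurity of the parent minus the size-weighted impurity of its children."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted_children


# Illustrative node: two non-lenses (0) and two lenses (1), split perfectly.
parent_labels = [0, 0, 1, 1]
reduction = gini_reduction(parent_labels, left=[0, 0], right=[1, 1])
```

A perfect split of a balanced binary node removes all impurity, so the reduction equals the parent impurity of 0.5; a split that leaves both children as mixed as the parent gives a reduction of zero.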

LRG sample
We repeated the analysis with the KiDS-LRG sample, but performed the voting with the human classifiers only for the top-200 candidates (sorted based on P ) out of ∼126 000 candidates.
We carried out the voting experiment for the LRG sample against the results from the decision tree implemented in Section 4.2. The mean of human classifier votes $s_m$ is plotted against $P$ in Fig. 6. There were 9, 116, and 75 candidates in the green, yellow, and red regimes, respectively, as shown in Table 3. There are 8 true positives and 67 false positives in the $n_s = 0$ (red) regime. If we apply the same decision tree (shown in Fig. 1), with an IC threshold of 104 in the final step, we can retain 5 out of 8 TP (60 per cent) at the expense of 14 false positives from the $n_s = 0$ regime. These candidates fall into the 'blue' regime. By removing the 'red' candidates, we remove an additional 56 out of 200 false positives in the final sample, a reduction of approximately a quarter (∼25 per cent), by combining the results from $P$, IC, and $n_s$ in the LRG sample.
Using a definition similar to that applied to the BG sample, TPs in the LRG sample can again be defined as the democratically elected samples. The democratically elected samples are shown as triangle-shaped markers in Fig. 6. We see that only three of the red candidates are present in the democratically elected regime. There are six blue, four green, and twenty-one yellow candidates in the democratically elected regime, which are shown in Fig. 7 as triangle-shaped markers. If we define true positives as the candidates belonging to the democratically elected regime, then there are 34 TPs. By removing red candidates, we will only lose 3 out of 34 TPs (∼10 per cent). This again validates the segmentation algorithm, and we can conclude that eliminating 'red' candidates can significantly reduce false positives while losing only a few strong lensing candidates. The standard deviation of the votes is high for many candidates, showing strong disagreement among voters for certain candidates. The candidates having the highest standard deviation are shown in Fig. 8 and their details in Table D4. These candidates show features that do not convince all of the voters that they are genuine lenses.

DISCUSSION
The primary objective of this paper has been to find an algorithm capable of classifying strong lenses without requiring human vetting, particularly in very large surveys such as the wide survey carried out with Euclid, which could potentially comprise several hundred thousand strong lensing candidates (Collett 2015). Previous approaches in strong lens classification have predominantly relied on the classification probability ($P$). Nagam et al. (2023) introduced DenseLens and the concept of combining $P$ and IC to refine candidate selection. After filtering based on $P$, we rank-order candidates using IC. In this work, we introduced segmentation as an additional metric to improve our ability to differentiate final candidates based on whether they contain plausible lensed features.
We propose the idea of considering the number of pixels ($n_s$) above a segmentation threshold (> 0.6) as an additional metric to reduce false positives. However, the values of $P$, IC, and $n_s$ are not independent. Hence, we employ a decision tree to combine them. To ensure that the decision tree does not discard too many strong lens candidates during the sample size reduction via segmentation, we set the selection criteria based on voting by human classifiers.
We find that retaining candidates with $n_s = 0$ when they have IC > 104 ensures the inclusion of highly plausible lenses without increasing the false positive rate significantly. The rationale behind this choice lies in the fact that not all variations of strong lenses are encompassed in the training data-set used for the segmentation algorithm. A low value of $n_s$ and a high value of IC are in contradiction, since the former suggests there are no lens features (or the features are seen to be from non-lenses) and the latter suggests the opposite. For instance, our segmentation training focused on candidates featuring a single lens in the foreground. However, when examining 'b1' in Fig. 5, it becomes evident that this specific candidate exhibits multiple foreground strong lenses and a larger Einstein radius than is expected for a single lens galaxy. Consequently, the combination of the Information Content (IC) and the segmentation metric $n_s$ addresses the mismatch between the training set and real data (leading to the tension between $n_s$ and IC in some cases). This approach recognizes the complexity of strong lensing scenarios, especially when deviations from the training set parameters are encountered, and underscores the need for a comprehensive evaluation that considers both IC values and segmentation thresholds. This tension could be alleviated by training the network on more complex lensing scenarios, but that is outside the scope of this work. Segmentation algorithms, when used with other network architectures involved in automated searches, can be beneficial. However, quantifying the extent of their impact would also be beyond the scope of this paper. Although the decision tree was tailored for the BG sample, we apply the decision tree to an LRG sample, obtaining similar results.
Figure 8. LRG candidates with high standard deviation shown in Fig. 6. Their details, including standard deviation values $s_{\rm std}$, are shown in Table D4.
Whereas segmentation can help select genuine lenses, subtracting foreground lens light also significantly aids lens modelling (Nightingale, Dye & Massey 2018; Etherington et al. 2022). Previous studies, such as Pearson, Li & Dye (2019), have shown a 34 per cent average increase in the accuracy of predicted lens model parameters when the foreground lens light is removed.

C O N C L U S I O N
In this work, we have introduced a segmentation algorithm (U-Net) to aid in reducing false positives when searching for galaxy-scale strong lenses in large surv e ys.We add this U-net algorithm to our previous classifier neural network (Nagam et al. 2023 ) which primarily used P and IC to detect and rank-order strong lenses.We illustrate its ef fecti veness by applying it to a sample of galaxies from the Kilo-De gree Surv e y.
We generated a mock data-set (Petrillo et al. 2017) of 10 000 mock lens and 10 000 non-lens instances and applied a classification threshold (P thres > 0.3), resulting in the identification of 9586 mock lens candidates and 875 non-lens candidates. Analysing the distribution of candidates shows that removing the 'Red' candidates (n s = 0) could eliminate 77 per cent of false positives at the cost of discarding 14 per cent of true lenses. This highlights the importance of the segmentation results for significantly increasing the purity of the final sample of strong lenses.
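The effect of this cut on the purity of the mock sample can be checked with a short calculation. The sketch below assumes that the 77 and 14 per cent fractions apply to the 875 non-lens and 9586 mock lens candidates that pass the classification threshold. The function and variable names are illustrative and are not part of the DenseLens code.

```python
def purity(true_pos: float, false_pos: float) -> float:
    """Fraction of selected candidates that are genuine lenses."""
    return true_pos / (true_pos + false_pos)


# Counts quoted in the text for candidates passing the classification threshold.
n_lens, n_nonlens = 9586, 875

# Fractions with n_s = 0 ('Red'), assumed to refer to the counts above.
red_frac_lens, red_frac_nonlens = 0.14, 0.77

# Candidates left after discarding the 'Red' (n_s = 0) regime.
tp_kept = n_lens * (1 - red_frac_lens)
fp_kept = n_nonlens * (1 - red_frac_nonlens)

purity_before = purity(n_lens, n_nonlens)
purity_after = purity(tp_kept, fp_kept)

print(f"purity before cut: {purity_before:.3f}")
print(f"purity after cut:  {purity_after:.3f}")
```

Under these assumptions the purity of the mock sample rises from about 0.92 to about 0.98. The mock sample is balanced between lenses and non-lenses, whereas real survey data are strongly unbalanced, so the absolute values are illustrative only.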
The final decision tree for the selection of strong gravitational lenses, for the Bright Galaxy (BG) sample from KiDS, involves rank-ordering candidates based on their classification scores (P), filtering out candidates with Information Content (IC) less than or equal to 50, and removing candidates that have both zero classified source pixels (n s = 0) and IC values less than or equal to 104. The subsequent human classifier validation process further refines the selection, revealing that eliminating n s = 0 candidates can significantly reduce false positives while losing considerably fewer confirmable strong lensing candidates. We present fourteen new strong lensing candidates, which were discovered with U-DenseLens and validated by four or more human classifiers as lens or may-be lens. The extension of the classifier to the Luminous Red Galaxy (LRG) sample confirms the decision tree's effectiveness, demonstrating a reduction in false positives by a quarter. The incorporation of human classifiers in the validation process ensures the preservation of genuine candidates while enhancing the reliability of the selection.
MNRAS 533, 1426-1441 (2024)
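The selection rules summarized above can be written as a short filter. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Candidate` class, the `regime` and `select` helpers, and the toy values are hypothetical. The colour regimes follow the figure captions, with 'red' taken here to mean n s = 0 with IC ≤ 104.

```python
from dataclasses import dataclass

IC_MIN = 50       # candidates with IC <= 50 are removed
IC_RESCUE = 104   # candidates with n_s = 0 are kept only if IC > 104


@dataclass
class Candidate:
    name: str   # hypothetical identifier
    p: float    # classification score P
    ic: float   # Information Content
    n_s: int    # number of classified source pixels


def regime(c: Candidate) -> str:
    """Colour regime used in the figures (blue, yellow, green, red)."""
    if c.n_s == 0:
        return "blue" if c.ic > IC_RESCUE else "red"
    return "green" if c.n_s >= 40 else "yellow"


def select(candidates: list[Candidate]) -> list[Candidate]:
    """Apply the IC and n_s cuts, then rank-order the survivors by P."""
    kept = [
        c for c in candidates
        if c.ic > IC_MIN and not (c.n_s == 0 and c.ic <= IC_RESCUE)
    ]
    return sorted(kept, key=lambda c: c.p, reverse=True)


# Toy values for illustration only, not taken from the KiDS samples.
sample = [
    Candidate("a", p=0.95, ic=40.0, n_s=60),   # removed: IC <= 50
    Candidate("b", p=0.90, ic=80.0, n_s=0),    # removed: n_s = 0 and IC <= 104
    Candidate("c", p=0.85, ic=120.0, n_s=0),   # kept: n_s = 0 but IC > 104
    Candidate("d", p=0.99, ic=70.0, n_s=12),   # kept
]
selected = select(sample)
print([(c.name, regime(c)) for c in selected])
```

In this toy example only candidates "d" and "c" survive, in that order. Keeping the two IC thresholds as named constants makes it straightforward to retune them for another sample.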
Looking ahead, our study suggests potential avenues for improvement, such as enhancing the realism of the training data, incorporating additional lensing types, and expanding the negative database. Although the classifier is fine-tuned for KiDS r-band data, we expect the proposed decision tree to be a robust framework for automating the search for strong gravitational lenses in upcoming large-scale astronomical surveys, e.g. those carried out with Euclid.

ACKNOWLEDGEMENTS
We would like to thank the Center for Information Technology of the University of Groningen for their support and for providing access to the Peregrine high performance computing cluster. The research for this paper was funded by the Centre for Data Science and Systems Complexity at the University of Groningen (www.rug.nl/research/fse/themes/dssc/). We would also like to thank our collaborators Yue-Dong and Rui Li for creating a website to vote for the lensing candidates. CT and VB acknowledge the INAF grant 2022 LEMON.

APPENDIX A: SEGMENTATION TRAINING
The U-Net algorithm is trained for positive candidates (lenses) with source pixels as 1 and the rest of the pixels as 0. This is explained in Fig. A1.
mounted on the VLT Survey Telescope (VST; Capaccioli & Schipani 2011) at ESO's Paranal observatory in Chile.

Figure 2. High-ranked candidates in the KiDS-BG sample (based on P) with IC ≤ 50. Such candidates (including those shown) with IC ≤ 50 were removed before the sample was given to the human classifiers for voting. Their details are shown in Table D1.

Figure 3. Mean of human classifier votes (s m) for all 1000 candidates from the KiDS-BG sample plotted against P. The 1000 candidates are segregated into four plots (blue, green, red, and yellow) based on their observed n s regime. Candidates that are democratically voted as May be lens or Sure lens by four or more voters are indicated by triangle-shaped markers.

Figure 4. Percentage of TP (blue continuous line) and FP (orange continuous line) recovered for a given range of IC values. At an IC value of 104, we recover as many as 60 per cent of true positives.

Figure 5. The top six candidates based on s m in each regime from Fig. 3 are shown. Top row: the top six candidates ranked by s m that have n s = 0 but IC > 104, shown as blue triangles in Fig. 3 (top-left). Second, third, and fourth rows: the top six candidates ranked by s m that have 0 < n s < 40, n s ≥ 40, and n s = 0, respectively, shown as yellow, green, and red triangles, respectively, in Fig. 3. Their details are shown in Table D2.

Figure 6. Mean of human classifier votes (s m) for all 200 candidates from the KiDS-LRG sample plotted against P. The 200 candidates are segregated into four plots (blue, green, red, and yellow) based on their observed n s regime. Candidates that are democratically voted as May be lens or Sure lens by four or more voters are indicated by triangle-shaped markers.

Figure 9. Fourteen new strong lensing candidates discovered in the BG sample that have been agreed on by four or more human classifiers as lens or may-be lens. The details are shown in Table D5.

Figure 10. Top two rows: top 12 candidates from the BG sample rank-ordered by P. Bottom two rows: segmentation maps of the corresponding top 12 candidates. Candidates with n s ≥ 40 are shown with a green border, and candidates with 0 < n s < 40 with a yellow border. The details are shown in Table D6.

Figure A1. The training of lenses and non-lenses for the U-Net algorithm. For lenses, the source pixels are labelled as 1 and the rest of the pixels as 0 (top). For non-lenses, all the pixels are labelled as 0 (bottom).

Figure A2. Description of the U-Net architecture used in this paper. In our model, we resize the input image of 101 × 101 pixels to 256 × 256 pixels through interpolation. The image is then passed through the U-Net architecture. The final output obtained from the U-Net model has 101 × 101 pixels, with each pixel value between 0 and 1.
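The resizing and the counting of source pixels can be sketched with NumPy. This is a minimal sketch, not the authors' implementation. The bilinear interpolation, the helper names `resize_bilinear` and `count_source_pixels`, the resizing of the network output back to 101 × 101 pixels, and the threshold of 0.5 are all assumptions, since neither the interpolation scheme nor the segmentation threshold is specified here. The U-Net itself is replaced by a stand-in array.

```python
import numpy as np


def resize_bilinear(img: np.ndarray, size: int) -> np.ndarray:
    """Resize a 2D image to (size, size) with separable linear interpolation."""
    h, w = img.shape
    ys = np.linspace(0, h - 1, size)   # target row coordinates
    xs = np.linspace(0, w - 1, size)   # target column coordinates
    # Interpolate along columns within each row, then along rows.
    tmp = np.stack([np.interp(xs, np.arange(w), row) for row in img])
    return np.stack([np.interp(ys, np.arange(h), col) for col in tmp.T], axis=1)


def count_source_pixels(prob_map: np.ndarray, threshold: float = 0.5) -> int:
    """n_s: number of pixels whose segmentation output exceeds the threshold."""
    return int(np.count_nonzero(prob_map > threshold))


rng = np.random.default_rng(0)
cutout = rng.random((101, 101))            # stand-in for a 101 x 101 cutout
net_input = resize_bilinear(cutout, 256)   # 256 x 256 input for the U-Net

# Stand-in for the U-Net output: zero everywhere except a small bright patch.
net_output = np.zeros((256, 256))
net_output[100:120, 100:120] = 1.0

seg_map = resize_bilinear(net_output, 101)  # back to 101 x 101, values in [0, 1]
n_s = count_source_pixels(seg_map)
print(net_input.shape, seg_map.shape, n_s)
```

A segmentation map that is zero everywhere gives n s = 0 for any positive threshold, which corresponds to the 'Red' regime.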

Figure B1. Illustration of DenseLens results for mock data. Top: classification prediction scores (P), IC (E), and total number of classified source pixels (n s) for the first 25 candidates rank-ordered by P value. Bottom: segmentation results for the respective candidates shown above.

Table 1. Comparison of segmentation scores (n s) for Lens and Non-Lens samples present in the mock data.
Table 1 (a) for lenses and Table 1 (b) for non-lenses. Notably, Table 1 (b) shows that 77 per cent of non-lens candidates with P > P thres fall into the Red category (n s = 0); in other words, segmentation does not classify a single pixel as being a lens feature. In contrast, only 14 per cent

Table 2. Distribution of TP and FP in the BG sample with respect to the mean of human classifier votes s m.

Figure 1. Flow diagram for the selection of strong gravitational lenses explained in Section 4.2.

Table 3. Distribution of TP and FP in the LRG sample with respect to the mean of human classifier votes s m.